Results 1 - 20 of 95
1.
Proceedings - 2022 13th International Congress on Advanced Applied Informatics Winter, IIAI-AAI-Winter 2022 ; : 181-188, 2022.
Article in English | Scopus | ID: covidwho-20243412

ABSTRACT

On social media, misinformation can spread quickly, posing serious problems. Understanding the content and sensitive nature of fake news and misinformation is critical to prevent the damage caused by them. To this end, the characteristics of information must first be discerned. In this paper, we propose a transformer-based hybrid ensemble model to detect misinformation on the Internet. First, false and true news on Covid-19 were analyzed, and various text classification tasks were performed to understand their content. The results were utilized in the proposed hybrid ensemble learning model. Our analysis revealed promising results, establishing the capability of the proposed system to detect misinformation on social media. The final model exhibited an excellent F1 score (0.98) and accuracy (0.97). The AUC (Area Under The Curve) score was also high at 0.98, and the ROC (Receiver Operating Characteristics) curve revealed that the true-positive rate of the data was close to one in this model. Thus, the proposed hybrid model was demonstrated to be successful in recognizing false information online. © 2022 IEEE.

2.
ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023 ; : 1204-1207, 2023.
Article in English | Scopus | ID: covidwho-20239230

ABSTRACT

Timeline summarization (TLS) is a challenging research task that requires researchers to distill extensive and intricate temporal data into a concise and easily comprehensible representation. This paper proposes a novel approach to timeline summarization using Abstract Meaning Representations (AMRs), a graph representation of text in which nodes are semantic concepts and edges denote relationships between concepts. With AMR, sentences with different wordings but similar semantics have similar representations. To make use of this feature for timeline summarization, a two-step sentence selection method that leverages features extracted from both AMRs and the text is proposed. First, AMRs are generated for each sentence. Sentences are then filtered by removing those with no named entities and keeping the ones with the highest number of named entities. In the next step, sentences to appear in the timeline are selected based on two scores: the Inverse Document Frequency (IDF) of AMR nodes, combined with the score obtained by applying a keyword extraction method to the text. Our experimental results on the TLS-Covid19 test collection demonstrate the potential of the proposed approach. © 2023 ACM.
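
The IDF-over-AMR-nodes scoring step can be sketched as follows. The sentence IDs and node sets are made up, and the keyword-score term that the paper combines with the IDF score is omitted for brevity.

```python
import math

# Sketch of the second-step scoring idea: a sentence scores higher when its
# AMR concept nodes are rare across the collection (high IDF). Node sets
# here are hypothetical; real ones come from an AMR parser.

sentences = {
    "s1": {"announce", "lockdown", "country"},
    "s2": {"say", "person", "country"},
    "s3": {"vaccine", "approve", "agency"},
}

n_docs = len(sentences)
df = {}
for nodes in sentences.values():
    for node in nodes:
        df[node] = df.get(node, 0) + 1

idf = {node: math.log(n_docs / count) for node, count in df.items()}

# Sentence score = mean IDF of its AMR nodes (+ keyword score, omitted here).
scores = {sid: sum(idf[n] for n in nodes) / len(nodes)
          for sid, nodes in sentences.items()}
best = max(scores, key=scores.get)   # "s3": all of its concepts are unique
```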

3.
CEUR Workshop Proceedings ; 3395:346-348, 2022.
Article in English | Scopus | ID: covidwho-20239057

ABSTRACT

Classification is vital to human beings in day-to-day life, as it breaks down complex subjects. In the same way, text classification is very important for understanding and realizing the subject of a text. © 2021 Copyright for this paper by its authors.

4.
ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023 ; : 1020-1029, 2023.
Article in English | Scopus | ID: covidwho-20238654

ABSTRACT

The COVID-19 pandemic has had a profound impact on the global community, and vaccination has been recognized as a crucial intervention. To gain insight into public perceptions of COVID-19 vaccines, survey studies and the analysis of social media platforms have been conducted. However, existing methods lack consideration of individual vaccination intentions or status and the relationship between public perceptions and actual vaccine uptake. To address these limitations, this study proposes a text classification approach to identify tweets indicating a user's intent or status on vaccination. A comparative analysis between the proportions of tweets from different categories and real-world vaccination data reveals notable alignment, suggesting that tweets may serve as a precursor to actual vaccination status. Further, regression analysis and time series forecasting were performed to explore the potential of tweet data, demonstrating the significance of incorporating tweet data in predicting future vaccination status. Finally, clustering was applied to the tweet sets with positive and negative labels to gain insights into underlying focuses of each stance. © 2023 ACM.
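
The "notable alignment" between tweet category proportions and real-world vaccination data can be checked with a simple correlation, sketched here on invented weekly numbers (the study's actual series are not reproduced in the abstract).

```python
# Hypothetical weekly series: share of "got vaccinated" tweets vs. the
# real-world share of newly vaccinated people, compared via Pearson r.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

tweet_share = [0.10, 0.15, 0.22, 0.30, 0.28]
vax_share   = [0.08, 0.14, 0.20, 0.31, 0.27]
r = pearson(tweet_share, vax_share)   # close to 1 for aligned series
```

A lagged variant of the same check (shifting `tweet_share` forward a week) is what would support the "precursor" claim.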

5.
CEUR Workshop Proceedings ; 3396:118-129, 2023.
Article in English | Scopus | ID: covidwho-20236466

ABSTRACT

Since the beginning of the global Covid-19 pandemic, text media materials have been full of the word "vax", and after the appearance of vaccines against the coronavirus and the start of the vaccination campaign around the world, "anti-vax" has been added as well. The article singles out the linguistic means of expressing evaluation in the headlines and leads of Ukrainian text media in materials dedicated to opponents of vaccination against Covid-19, and also considers the possibility of recognizing this evaluation automatically with machine methods. It was found that among the linguistic means of expressing assessment, colloquial vocabulary (jargon and slang) and phraseology come to the fore. © 2023 Copyright for this paper by its authors.

6.
ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023 ; : 1004-1013, 2023.
Article in English | Scopus | ID: covidwho-20233356

ABSTRACT

Humor is a cognitive construct that predominantly evokes the feeling of mirth. During the COVID-19 pandemic, the situations that arose out of the pandemic were so incongruous to the world we knew that even factual statements often drew a humorous reaction. In this paper, we present a dataset of 2510 samples hand-annotated with labels such as humor style, type, theme, target, and stereotypes formed or exploited while creating the humor, in addition to 909 memes. Our dataset comprises Reddit posts, comments, Onion news headlines, real news headlines, and tweets. We evaluate the task of humor detection and maladaptive humor detection on state-of-the-art models, namely RoBERTa and GPT-3. The finetuned models trained on our dataset show significant gains over zero-shot models, including GPT-3, when detecting humor. Even though GPT-3 is good at generating meaningful explanations, we observed that it fails to detect maladaptive humor due to the absence of overt targets and profanities. We believe that the presented dataset will be helpful in designing computational methods for topical humor processing, as it provides a unique sample set to study the theory of incongruity in a post-pandemic world. The data is available to the research community at https://github.com/smritae01/Covid19-Humor. © 2023 ACM.

7.
CEUR Workshop Proceedings ; 3395:361-368, 2022.
Article in English | Scopus | ID: covidwho-20232900

ABSTRACT

Determining the sentiments of the public with regard to COVID-19 vaccines is crucial for nations to efficiently carry out vaccination drives and spread awareness. Hence, it is a field requiring accurate analysis, and it captures the interest of many researchers. Microblogs from social media websites such as Twitter sometimes contain colloquial expressions or terminology that is difficult to interpret, making the task a challenging one. In this paper, we propose a method for multi-label text classification for the "Information Retrieval from Microblogs during Disasters (IRMiDis)" track presented by the "Forum for Information Retrieval Evaluation" in 2022, related to vaccine sentiment among the public and the reporting of someone experiencing COVID-19 symptoms. The following methodologies have been utilised: (i) Word2Vec and (ii) BERT, which uses contextual embeddings rather than the fixed embeddings used by conventional natural language models. For Task 1, the overall F1 score and Accuracy are 0.503 and 0.529, respectively, placing us fourth among all the teams, while for Task 2, they are 0.740 and 0.790, placing us second among all the teams who submitted their work. Our code is openly accessible through GitHub. © 2022 Copyright for this paper by its authors.
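
Multi-label classification, as used in the track above, amounts to one independent binary decision per label, so a tweet can carry any subset of tags. The label names and scores below are illustrative stand-ins for a classifier's per-label sigmoid outputs, not the track's actual label set.

```python
# Decode a multi-label prediction: keep every label whose (hypothetical)
# sigmoid score clears the threshold, rather than picking a single class.

LABELS = ["pro-vaccine", "anti-vaccine", "reports-symptoms"]

def decode(scores, threshold=0.5):
    return [lab for lab, s in zip(LABELS, scores) if s >= threshold]

print(decode([0.91, 0.08, 0.72]))  # -> ['pro-vaccine', 'reports-symptoms']
print(decode([0.10, 0.20, 0.30]))  # -> []  (no label clears the threshold)
```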

8.
2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2022 ; : 751-754, 2022.
Article in English | Scopus | ID: covidwho-2327440

ABSTRACT

Recent studies in machine learning have demonstrated the effectiveness of applying graph neural networks (GNNs) to single-cell RNA sequencing (scRNA-seq) data to predict COVID-19 disease states. In this study, we propose a graph attention capsule network (GACapNet) which extracts and fuses Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) transcriptomic patterns to improve node classification performance on cells and genes. Significantly different from existing GNN approaches, we innovatively incorporate a capsule layer with dynamic routing into our model architecture to combine and fuse gene features effectively and to allow the more prominent gene features to appear in the output. We evaluate our GACapNet model on two scRNA-seq datasets, and the experimental results show that our GACapNet model significantly outperforms state-of-the-art baseline models. Therefore, our study demonstrates the capability of advanced machine learning models to generate predictive features and evolutionary patterns of the SARS-CoV-2 pathogen, and their applicability to closing knowledge gaps in the pathogenesis and recovery of COVID-19. © 2022 IEEE.
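
The dynamic-routing mechanism mentioned above can be sketched generically in NumPy, following the standard capsule-network formulation (softmax coupling, squash nonlinearity, agreement update). This is not the authors' GACapNet code; shapes, iteration count, and random inputs are all illustrative.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Shrinks vector norms into [0, 1) while preserving direction.
    norm2 = (s ** 2).sum(axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, iterations=3):
    """u_hat: (n_in, n_out, dim) prediction vectors from input capsules."""
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))                   # routing logits
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        s = (c[..., None] * u_hat).sum(axis=0)    # weighted sum per output capsule
        v = squash(s)                             # output capsule vectors
        b = b + (u_hat * v[None]).sum(axis=-1)    # reward agreement
    return v

rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(8, 3, 4)))   # 8 inputs -> 3 output capsules
```

The agreement update is what lets prominent, mutually consistent features dominate the output, which is the property the abstract appeals to.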

9.
7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022 ; : 1-10, 2022.
Article in English | Scopus | ID: covidwho-2290872

ABSTRACT

Named Entity Recognition (NER) is a well-known problem for the natural language processing (NLP) community. It is a key component of different NLP applications, including information extraction, question answering, and information retrieval. In the literature, there are several Arabic NER datasets with different named entity tags; however, due to data and concept drift, we are always in need of new data for NER and other NLP applications. In this paper, first, we introduce Wassem, a web-based annotation platform for Arabic NLP applications. Wassem can be used to manually annotate textual data for a variety of NLP tasks: text classification, sequence classification, and word segmentation. Second, we introduce the COVID-19 Arabic Named Entities Recognition (CAraNER) dataset, extracted from the Arabic Newspaper COVID-19 Corpus (AraNPCC). CAraNER has 55,389 tokens distributed over 1,278 sentences randomly extracted from Saudi Arabian newspaper articles published during 2019, 2020, and 2021. The dataset is labeled by five annotators with five named-entity tags, namely: Person, Title, Location, Organization, and Miscellaneous. The CAraNER corpus is available for download for free. We evaluate the corpus by finetuning four BERT-based Arabic language models on the CAraNER corpus. The best model was AraBERTv0.2-large, with 0.86 for the macro F1 measure. © 2022 Association for Computational Linguistics.
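
The macro F1 measure reported above is the unweighted mean of per-tag F1 scores, which can be sketched with hypothetical per-tag counts (the real per-tag results are not in the abstract).

```python
# Macro F1 over the five CAraNER tags from invented (tp, fp, fn) counts:
# macro averaging weights every tag equally, so rare tags like Title count
# as much as frequent ones like Person.

counts = {
    "Person":        (90, 10, 15),
    "Title":         (40, 12, 10),
    "Location":      (85,  9,  8),
    "Organization":  (70, 20, 18),
    "Miscellaneous": (30, 15, 20),
}

def f1(tp, fp, fn):
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)
```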

10.
ACM Transactions on Internet Technology ; 23(1), 2023.
Article in English | Scopus | ID: covidwho-2306388

ABSTRACT

The outbreak of Covid-19 has exposed the lack of medical resources, especially the lack of medical personnel. This results in time and space restrictions on medical services, and patients cannot obtain health information anytime and anywhere. Based on a medical knowledge graph, healthcare bots alleviate this burden effectively by providing patients with diagnosis guidance, pre-diagnosis, and post-diagnosis consultation services through human-machine dialogue. However, medical utterances are more complicated in language structure, and there are complex intention phenomena in their semantics. It is a challenge to detect the single, multiple, and implicit intents of a patient's utterance. To this end, we create a high-quality annotated Chinese Medical query (utterance) dataset, CMedQ (about 16.8k queries in the medical domain, including single, multiple, and implicit intents). It is hard to detect intent on such a complex dataset with traditional text classification models. Thus, we propose a novel detection model, Conco-ERNIE, which uses concept co-occurrence patterns to enhance the representation of the pre-trained model ERNIE. These patterns are mined using the Apriori algorithm and embedded via Node2Vec. Their features are aggregated with semantic features into Conco-ERNIE by an attention module, which can capture users' explicit intents and also predict their implicit intents. Experiments on CMedQ demonstrate that Conco-ERNIE achieves outstanding performance over the baselines. Based on Conco-ERNIE, we develop an intelligent healthcare bot, MedicalBot. To provide knowledge support for MedicalBot, we also build a Chinese medical graph, CMedKG (about 45k entities and 283k relationships). © 2023 Association for Computing Machinery.
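
The concept co-occurrence mining step can be sketched as the first pass of an Apriori-style count: tally concept pairs across queries and keep those above a minimum support. The query concept sets below are invented, and a full Apriori run would continue to larger itemsets.

```python
from itertools import combinations
from collections import Counter

# Count concept pairs across (hypothetical) medical queries and keep the
# pairs meeting a minimum support threshold.

queries = [
    {"fever", "cough", "covid"},
    {"fever", "covid", "test"},
    {"headache", "cough"},
    {"fever", "covid"},
]

min_support = 3
pair_counts = Counter()
for concepts in queries:
    for pair in combinations(sorted(concepts), 2):
        pair_counts[pair] += 1

frequent_pairs = {p for p, c in pair_counts.items() if c >= min_support}
# Only ("covid", "fever") occurs in >= 3 queries.
```

In the paper's pipeline, such frequent patterns are then embedded with Node2Vec and fused with ERNIE's semantic features via attention.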

11.
2023 International Conference on Intelligent Systems, Advanced Computing and Communication, ISACC 2023 ; 2023.
Article in English | Scopus | ID: covidwho-2305549

ABSTRACT

With the advancement of technology, web technology in the form of social media is one of the main origins of information worldwide. Web technology has helped people enhance their ability to know, learn, and gain knowledge about things around them. The benefits that technological advancement offers are boundless. However, apart from these, social media also has major issues related to the problems and challenges of filtering out the right information from the wrong. The sources of information become highly unreliable at times, and it is difficult to differentiate and decipher real news or real information from fake. Cybercrime, through fraud mechanisms, is a pervasive menace permeating media technology every single day. Hence, this article reports an attempt at fake news detection in Khasi social media data. To execute this work, the data analyzed were extracted from different Internet platforms, mainly from social media articles and posts. The dataset consists of fake news as well as real news based on COVID-19, and also other forms of wrong information disseminated throughout the pandemic period. We manually annotated the assembled Khasi news, and the dataset consists of 116 news items. We used three machine learning techniques in our experiment: the Decision Tree, the Logistic Regression, and the Random Forest approach. We observed in the experimental results that the Decision Tree-based approach yielded the most accurate results, with an accuracy of 87%, whereas the Logistic Regression approach yielded an accuracy of 82% and the Random Forest approach an accuracy of 75%. © 2023 IEEE.
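
The three-way comparison above can be sketched with scikit-learn on placeholder English texts (the Khasi corpus is not public here); the same TF-IDF features are fed to each classifier family, and features/labels are invented.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in data: 1 = fake, 0 = real.
texts = ["miracle cure kills virus", "drink bleach to cure covid",
         "vaccine trial shows promise", "hospital reports new cases",
         "garlic cures covid overnight", "who issues travel guidance"]
labels = [1, 1, 0, 0, 1, 0]

X = TfidfVectorizer().fit_transform(texts)
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(random_state=0),
}
# Training accuracy only, for illustration; the study would use held-out data.
train_acc = {name: m.fit(X, labels).score(X, labels)
             for name, m in models.items()}
```

With only 116 items, as in the paper, cross-validation rather than a single split would give more stable accuracy estimates.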

12.
38th International Conference on Computers and Their Applications, CATA 2023 ; 91:124-137, 2023.
Article in English | Scopus | ID: covidwho-2304334

ABSTRACT

On social media, false information can proliferate quickly and cause big issues. To minimize the harm caused by false information, it is essential to comprehend its sensitive nature and content. To achieve this, it is necessary to first identify the characteristics of information. To identify false information on the internet, we suggest an ensemble model based on transformers in this paper. First, various text classification tasks were carried out to understand the content of false and true news on Covid-19. The proposed hybrid ensemble learning model used the results. The results of our analysis were encouraging, demonstrating that the suggested system can identify false information on social media. All the classification tasks were validated and showed outstanding results. The final model showed excellent accuracy (0.99) and F1 score (0.99). The Receiver Operating Characteristics (ROC) curve showed that the true-positive rate of the data in this model was close to one, and the AUC (Area Under The Curve) score was also very high at 0.99. Thus, it was shown that the suggested model was effective at identifying false information online. © 2023, EasyChair. All rights reserved.
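
The metrics reported above (accuracy, F1, AUC, ROC) are standard and can be computed with scikit-learn; the labels and probabilities below are invented to show where each number comes from.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve

# Accuracy and F1 are computed from hard labels; AUC and the ROC curve are
# computed from predicted probabilities, before any thresholding.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.95, 0.10, 0.85, 0.70, 0.30, 0.05, 0.90, 0.40]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

acc = accuracy_score(y_true, y_pred)
f1  = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
# Here the classifier is perfect, so acc == f1 == auc == 1.0.
```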

13.
55th Annual Hawaii International Conference on System Sciences, HICSS 2022 ; 2022-January:2971-2980, 2022.
Article in English | Scopus | ID: covidwho-2303216

ABSTRACT

In recent years, automated political text processing became an indispensable requirement for providing automatic access to political debate. During the Covid-19 worldwide pandemic, this need became visible not only in social sciences but also in public opinion. We provide a path to operationalize this need in a multi-lingual topic-oriented manner. Using a publicly available data set consisting of parliamentary speeches, we create a novel process pipeline to identify a good reference model and to link national topics to the cross-national topics. We use design science research to create this process pipeline as an artifact. © 2022 IEEE Computer Society. All rights reserved.

14.
IEEE Access ; 11:30575-30590, 2023.
Article in English | Scopus | ID: covidwho-2301709

ABSTRACT

Social networks and other digital media deal with huge amounts of user-generated content, where hate speech has become an increasingly relevant problem. A great effort has been made to develop automatic tools for its analysis and moderation, at least in its most threatening forms, such as violent acts against people and groups protected by law. One limitation of current approaches to automatic hate speech detection is the lack of context. The spotlight on isolated messages, without considering any type of conversational context or even the topic being discussed, severely restricts the available information for determining whether a post on a social network should be tagged as hateful or not. In this work, we assess the impact of adding contextual information to the hate speech detection task. We specifically study a subdomain of Twitter data consisting of replies to digital newspapers' posts, which provides a natural environment for contextualized hate speech detection. We built a new corpus in Spanish (Rioplatense variant) focused on hate speech associated with the COVID-19 pandemic, annotated using guidelines carefully designed by our interdisciplinary team. Our classification experiments using state-of-the-art transformer-based machine learning techniques show evidence that adding contextual information improves the performance of hate speech detection for two proposed tasks: binary and multi-label prediction, increasing their Macro F1 by 4.2 and 5.5 points, respectively. These results highlight the importance of using contextual information in hate speech detection. Our code, models, and corpus have been made available for further research. © 2013 IEEE.
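
In its simplest form, "adding context" means pairing the news post with the reply before tokenization so the classifier sees both. The separator token and function names below are illustrative, not the authors' exact setup (transformer tokenizers typically handle sentence pairs natively).

```python
# Sketch of context injection for reply classification: with_context=True
# prepends the news post, separated by a marker, before the text reaches
# the model's tokenizer.

SEP = " [SEP] "

def build_input(news_post: str, reply: str, with_context: bool) -> str:
    return (news_post + SEP + reply) if with_context else reply

post = "Government announces new vaccination schedule"
reply = "these people again..."
contextual = build_input(post, reply, with_context=True)
isolated = build_input(post, reply, with_context=False)
```

Comparing a model trained on `contextual` inputs against one trained on `isolated` inputs is exactly the ablation whose Macro F1 gap the paper reports.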

15.
2022 International Conference on Electrical Engineering and Sustainable Technologies, ICEEST 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2297523

ABSTRACT

COVID-19 is one of the most lethal viruses, causing millions of deaths to date. It was initially detected in Wuhan, China, and then spread rapidly around the globe, creating major setbacks for the public health sector. Millions of deaths are attributable not only to the virus itself but also to people's mental state and to sentiments triggered by fear of the virus. These sentiments are predominantly available in posts/tweets on social media. This paper presents a novel approach to exploratory data analysis of Twitter to understand the emotions of the general public, country-wise and user-wise. First, K-Means clustering is employed for topic modeling to categorize the emotions in each tweet. Then, supervised machine learning techniques are used to categorize the multi-label tweets. This research concluded that fear was the most common emotion in the Twitter discussion. Furthermore, we classified the dataset using decision tree (DT), logistic regression (LR), and support vector machine (SVM) classifiers; the classification results show that SVM can attain better accuracy (99%) for COVID-19 text classification. © 2022 IEEE.
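
The K-Means topic-modeling step can be sketched by clustering TF-IDF tweet vectors, so each cluster can be read as a candidate emotion/topic group. The tweets below are invented examples, and the cluster count is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Cluster toy tweets into 3 groups; inspecting each cluster's top terms is
# how clusters get interpreted as emotions (fear, gratitude, anger, ...).
tweets = ["so scared of this virus", "terrified to leave home",
          "grateful for our nurses", "thank you health workers",
          "angry at the lockdown rules", "furious about the restrictions"]

X = TfidfVectorizer().fit_transform(tweets)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
clusters = km.labels_.tolist()   # one cluster id per tweet
```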

16.
Lecture Notes in Networks and Systems ; 635 LNNS:339-344, 2023.
Article in English | Scopus | ID: covidwho-2294623

ABSTRACT

Due to their need to be connected to the rest of the world, people started to use social networks extensively to share their feelings and stay informed, especially during the Covid-19 pandemic and its lockdowns. The tremendous growth of content on social media intensified researchers' work on natural language understanding, text classification, and information retrieval. Unfortunately, not all languages have benefited equally from this interest; Arabic is an example of such languages. The main reason behind this gap is the limited number of datasets addressing Covid-19-related topics. To this end, we performed a first-of-its-kind systematic review that covers, to the best of our knowledge, most of the Arabic Covid-19 datasets that are freely available or whose access is granted upon request. This paper presents these 15 datasets alongside their features and the type of analysis conducted. The general concern of the authors is to direct researchers to reliable and freely available datasets that advance the progress of Arabic Covid-19-related studies. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

17.
1st International Conference on Machine Learning, Computer Systems and Security, MLCSS 2022 ; : 301-306, 2022.
Article in English | Scopus | ID: covidwho-2294226

ABSTRACT

The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We are therefore working on a COVID-19 dataset on the Omicron variant to recognize named entities in a given text. We collect COVID-related data from newspapers and from tweets. This article covers named entities such as COVID variant names, organization names, location names, and vaccine names. The pipeline includes tokenisation, POS tagging, chunking, labelling, and editing. It helps us recognize, from a huge dataset, named entities such as where COVID spread most (location), which variant spread most (variant name), and which vaccine has been given (vaccine name). In this work, we have identified these names. If we treat unemployment, economic downfall, death, recovery, or depression as topics, we can also identify the topic names and the phase in which each occurred. © 2022 IEEE.

18.
5th International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2022 ; : 565-569, 2022.
Article in English | Scopus | ID: covidwho-2277252

ABSTRACT

Radiology is used as an important assessment for patients with pulmonary disease. Radiology images are usually accompanied by a written report from a radiologist, to be passed to the other referring physicians. These radiology reports are written in natural language, and they can have different systematic structures depending on the language used. In our study, the radiology reports were collected from an Indonesian hospital and written in Bahasa Indonesia. We performed automatic text classification to differentiate the information written in the radiology reports into two classes, COVID-19 and non-COVID-19. To find the best model, we evaluated several embedding techniques available for Bahasa and five Machine Learning (ML) models, namely (1) XGBoost, (2) fastText, (3) LSTM, (4) Bi-LSTM, and (5) IndoBERT. The results show that IndoBERT outperformed the others with an accuracy of 98%. In terms of training speed, the shallow neural network architecture implemented with the fastText library can train the model in under one second and still achieve a reasonably good accuracy of 86%. © 2022 IEEE.

19.
19th IEEE India Council International Conference, INDICON 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2261610

ABSTRACT

In the course of the recent pandemic, we have witnessed non-clinical approaches such as data mining and artificial intelligence techniques being extensively utilized to restrain and combat the spread of COVID-19 across the globe. The emergence of artificial intelligence in the medical field has helped reduce the immense burden on medical systems by providing the best means for the diagnosis and prognosis of COVID-19. This work attempts to analyze and evaluate superlative models on robust data resources on symptoms of COVID-19, consisting of age, gender, demographic information, pre-existing medical conditions, and symptoms experienced by patients. This study establishes a paradigmatic pipeline of supervised learning algorithms coupled with feature extraction techniques, surpassing the current state-of-the-art results by achieving an accuracy of 93.360. The optimal score was found by performing feature extraction on the data using principal component analysis (PCA), followed by binary classification using the AdaBoost classifier. In addition, the present study also establishes the contribution of various symptoms to the diagnosis of the malady. © 2022 IEEE.
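
The PCA-then-AdaBoost pipeline described above can be sketched with scikit-learn on synthetic stand-in data (the study's patient records are not public here); the component count and dataset parameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic binary-classification data standing in for symptom records.
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Feature extraction (PCA) feeding binary classification (AdaBoost),
# chained so the test fold is transformed with the training fold's PCA.
clf = make_pipeline(PCA(n_components=10), AdaBoostClassifier(random_state=0))
clf.fit(X_tr, y_tr)
test_acc = clf.score(X_te, y_te)
```

Wrapping both steps in one `Pipeline` avoids the common leakage mistake of fitting PCA on the full dataset before splitting.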

20.
1st International Conference on Computational Science and Technology, ICCST 2022 ; : 821-824, 2022.
Article in English | Scopus | ID: covidwho-2260303

ABSTRACT

Since the adoption of the internet as a medium for communicating information, fake or false information and news have always been a major issue. Incidents of false information have always increased in times of crisis on national or international scales. The world witnessed a global pandemic from the coronavirus, causing a complete disruption in the functioning of society. News of bogus cures, home remedies, and medicines started to make its way around the world. The number of incidents of such false news only increased as the pandemic worsened and more people fell sick and died. In times of desperation, people can easily be persuaded to try unverified and possibly dangerous medicines or cures that can cost them their money as well as their health. In this paper, natural language processing is used to first identify and differentiate text that contains information regarding Covid-19 from text that does not. Word frequency scores such as TF and IDF scores are then calculated. The intent of the text is then analyzed by observing the mannerisms detected in false news. With this analysis, the potential of the text to be false or fake is determined. This research intends to explore the linguistics of false news and to get one step ahead in identifying fake news. The same methodology can be used to analyze data related to other specific topics. © 2022 IEEE.
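
The TF and IDF scores mentioned above are computed as follows, on toy documents: TF is a term's count divided by the document length, and IDF is the log of the corpus size over the number of documents containing the term.

```python
import math

# Toy corpus; each document is a list of tokens.
docs = [
    "garlic cures covid covid".split(),
    "vaccine trial for covid".split(),
    "stock market news today".split(),
]

def tf(term, doc):
    return doc.count(term) / len(doc)

def idf(term, corpus):
    df = sum(term in doc for doc in corpus)
    return math.log(len(corpus) / df)

tfidf_covid_doc0  = tf("covid", docs[0]) * idf("covid", docs)
tfidf_garlic_doc0 = tf("garlic", docs[0]) * idf("garlic", docs)
# "garlic" outscores "covid" in doc 0 despite a lower TF, because it is
# rarer across the corpus; that is the weighting effect used here.
```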
